Now that we have obtained a solution to our inverse problem, some non-unique aspects still remain: we do not yet have values for $\alpha_s$, $\alpha_x$, and $\beta$. In this module we will go over some of the practical aspects of choosing these values.
There are several methods that can be used to choose $\beta$; this module will put only one of them into practice. A good summary of methods for choosing $\beta$ can be found on the Inversion for Applied Geophysics website: http://www.eos.ubc.ca/ubcgif/iag/index.htm.
In our case we are going to perform our inversion for a range of $\beta$ values and then compute $\phi_d$ and $\phi_m$ for each instance. Plotting $\phi_d$ vs. $\phi_m$ yields what is known as the Tikhonov curve. By looking at the behavior of the curve, one can then select a criterion for choosing the optimal $\beta$ value. In most cases, a desired value for $\phi_d$ is chosen (denoted $\phi_d^*$), and the $\beta$ value that corresponds to it is considered the appropriate choice. As mentioned briefly in the last module, given that we have assumed Gaussian statistics for our noise model with known standard deviations, we can expect $\phi_d$ to be approximately equal to the number of data. So we are going to set our value for $\phi_d^*$ to the number of data we have and choose the corresponding $\beta$.
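For our example that target is easy to write down: the synthetic survey we rebuild below produces $N = 20$ data, so $\phi_d^* = 20$. As a minimal sketch (it assumes the dobs vector built in the cells below):
In [ ]:
# Discrepancy principle: with Gaussian noise of known standard deviation,
# E[phi_d] is approximately the number of data, so we target phi_d^* = N
phi_dstar = len(dobs)   # here N = 20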
It is best, however, to see things in action as we go along, so once again, we are going to recreate the material from the last module and proceed from there. As before, we will begin by importing our packages:
In [2]:
# Import Packages
from SimPEG import *
from IPython.html.widgets import interactive
import numpy as np
%pylab inline
Next, we will rebuild our model parameters, kernel functions, averaging and volume matrices, produce our synthetic data, add noise, and build the tools we need to calculate $\phi_d$ and $\phi_m$ as before. This time, however, we will avoid plotting them.
In [1]:
# Begin by creating a ficticious set of model parameters
M=1000 # Set number of model parameters
x=np.linspace(0,1,M+1) # Define 1D domain on nodes
xc = 0.5*(x[1:] + x[0:-1]) # Define 1D domain on cell centers
m = np.zeros(M) # preallocate m array
# Define Gaussian function:
gauss = lambda x, a, mean, SD: a*np.exp(-(x-mean)**2./(2.*SD**2.)) # create a Gaussian function
# Choose parameters for Gaussian, evaluate, and store in an array, f.
SD=6
mean=0
a=1/(SD*sqrt(2*pi))
x2=np.linspace(-20,20,int(0.2*M)) # domain for evaluating the Gaussian (array length must be an integer)
f = gauss(x2,a, mean,SD)
# Define a boxcar function:
box = 0.4*np.ones(int(0.2*M))
# Insert the Gaussian and Boxcar into m:
m[int(0.2*M):int(0.4*M)]=box
m[int(0.6*M):int(0.8*M)]=10*f
# Make the set of kernel functions
g = lambda x, i, p, q: np.exp(-p*i*x)*np.cos(2*np.pi*q*i*x) # create an anonymous function as immediately above
p = 0.01 # Set values for p, q
q = 0.15
N = 20 # specify number of output data
Gn = np.zeros((M+1,N))
for i in range(N):
    f = g(x,i,p,q)
    Gn[:,i] = f
# Make Averaging Matrix
n=M+1 # Define n as the N+1 dimension of the matrix
w=n-1 # Define w as the n-1 dimension of the matrix
s = (M,n) # Store matrix values
Av = np.zeros(s) # Create a matrix of zeros of the correct dimensions
# and fill in with elements using the loop below (note the 1/2 is included in here).
for i in range(M):
    j=i
    k=i+1
    Av[i,j] = 0.5
    Av[i,k] = 0.5
# make the Volume, "delta x" array
Deltax = diff(x)
V = np.diag(Deltax) # create diagonal matrix
# Produce our data
G=np.dot(np.transpose(np.dot(Av, Gn)), V) # Assemble the forward operator G
d = np.dot(G,m) # Create synthetic data
xd=np.linspace(0,1,len(d)) # make x array for data
# Add noise to our synthetic data
noise = lambda fl, length, data, s: fl + randn(length)*data*s # introduce noise using a floor (fl), the data length, the data themselves, and a scaling factor (s)
r = noise(0, len(d), d, 0.04)
dobs= d + r
# Calculate the data misfit, phi_d #
####################################
# Anonymous function for the squared, weighted 2-norm
L2 = lambda A, w: dot(dot(w.T,A.T),dot(A,w))
#This constructs the inverses of the variances.
invvar = lambda floor, percent, data: 1./(floor + percent*np.abs(data))
# assign a floor
flr = 0.2*dot(d.T,d)**0.5/len(d)
# Make Wd
eps = invvar(flr,0.05,dobs) # define epsilon and Wd
Wd=np.diag(eps)
# Take the 2-norm
phi_d = L2(Wd, np.dot(G,m)-dobs)
# Reference Model#
##################
## A constant reference model ##
mref = np.mean(m)*np.ones(M)
# Domains #
############################### # Set up domains:
l = np.power(x[1:len(x)]-x[0:len(x)-1],0.5) # square roots of the distances between nodes, l
midx=np.dot(Av,x) # vector of midpoints, midx
h = np.power(midx[1:len(midx)]-x[0:len(midx)-1], 0.5) # Calculate distances between midpoints & take sqr roots of each value in vector h
# Create Ws, Wx matrices
Ws=np.diag(l) # put length vector into a diagonal matrix, Ws
Wx = np.zeros((len(m), len(m))) # preallocate array and enter values into matrix, Wx
for i in range(len(h)):
    j=i
    k=i+1
    Wx[i,j] = h[i]
    Wx[i,k] = h[i]
alpha_s=0.1 # Set alpha_s and alpha_x values
alpha_x=0.15
# build Wm #
Wm=np.concatenate((alpha_s*Ws, alpha_x*Wx), axis=0)
The next step is to make the $\phi_d$ vs. $\phi_m$ "Tikhonov" curve. We are going to do this with a loop, starting with a $\beta$ value of 100000 and halving it at each iteration. This will give us a picture of how $\phi_d$ and $\phi_m$ interact with each other as $\beta$ changes. We will define a lambda function to obtain the recovered model as before, and then store and plot the values of $\phi_d$ and $\phi_m$. Run the script below and note the "trade off" between $\phi_d$ and $\phi_m$: as $\phi_d$ gets smaller, $\phi_m$ gets larger.
In [52]:
# Make the Tikhonov Curve
beta = 100000 # Set beta value
itns = 10 # set maximum number of iterations
phdTik=np.zeros(itns) # preallocate arrays
phmTik=np.zeros(itns)
foo = np.zeros(itns)
# Get recovered model
mrecovered = lambda G,Wd,Wm,data,mref,beta: dot(np.linalg.inv(dot(dot(G.T,Wd.T),dot(Wd,G)) + beta*dot(Wm.T,Wm)), dot(dot(G.T,Wd.T),dot(Wd,data)) + beta*dot(dot(Wm.T,Wm),mref))
for i in range(itns):
    mrec = mrecovered(G,Wd,Wm,dobs,mref,beta)
    phdTik[i] = L2(Wd, np.dot(G,mrec)-dobs)
    phmTik[i] = L2(Wm, mref-mrec)
    beta=beta*0.5
pylab.plot(phmTik,phdTik,'o')
pylab.xlabel('$\phi_m$')
pylab.ylabel('$\phi_d$')
pylab.title('Tikhonov Curve')
Out[52]:
Let's consider how different $\beta$ values affect the recovered model. We know from the statistics that the $\beta$ value we seek should be the one for which $\phi_d$ equals the number of data. If the $\beta$ value we choose results in a large $\phi_d$ (that is, a large misfit), our predicted data will look less like the observations. In addition, the model regularization term will be small, which means that the model will be "simpler" than the optimum model.
In [53]:
beta = 100000
mrec = mrecovered(G,Wd,Wm,dobs,mref,beta)
# Get residual of mref and mrec
phi_d2 = L2(Wd,np.dot(G,mrec)-dobs)
phi_m2 = L2(Wm, mref-mrec)
# Plot
pylab.plot(xc,m)
pylab.plot(xc,mrec)
pylab.xlabel('x')
pylab.ylabel('m(x)')
pylab.title('Model, $m(x)$')
Out[53]:
If we choose a value that produces a smaller misfit than the statistically expected value, then the model regularization term will be large. That means the model is more complicated, or in geologic terms, there is more "structure" in the model, so it is likely that some of that excess structure is there to account for noise in the data. In other words, a model that produces too small a misfit probably has excess, erroneous features.
In [57]:
beta = 50
mrec = mrecovered(G,Wd,Wm,dobs,mref,beta)
# Get residual of mref and mrec
phi_d2 = L2(Wd,np.dot(G,mrec)-dobs)
phi_m2 = L2(Wm, mref-mrec)
# Plot
pylab.plot(xc,m)
pylab.plot(xc,mrec)
pylab.xlabel('x')
pylab.ylabel('m(x)')
pylab.title('Model, $m(x)$')
Out[57]:
So how do we achieve our optimum value $\phi_d^*$? We are going to implement a variation of a bisection method. What we want is to find the value of $\beta$ that corresponds to $\phi_d=\phi_d^*$. Of course we will not obtain exact equality, so we choose a tolerance such that, while the absolute value of the difference between $\phi_d$ and $\phi_d^*$ is greater than that tolerance, our algorithm keeps searching. We start by assigning an initial value to $\beta$ and then check our tolerance condition. If it is not met, we check whether $\phi_d$ is greater than $\phi_d^*$. If it is, we halve $\beta$ and continue checking. Once we obtain a value of $\phi_d$ that is less than $\phi_d^*$, we want the algorithm to back up: we know the previous value of $\phi_d$ was too large and our current value is too small, so we pick the value halfway between the two corresponding $\beta$ values and check on which side of it $\phi_d^*$ lies. From there we proceed to "zero in" on our choice of $\beta$ until the desired tolerance is reached. Running the code below finds the optimal $\beta$ value and prints out some of the details of what happens inside the algorithm.
In [58]:
beta_0=1000000
beta=beta_0
tol=0.00001
i=1
itns=10
mrec = mrecovered(G,Wd,Wm,dobs,mref,beta)
phid = L2(Wd,np.dot(G,mrec)-dobs)
phi_dstar=20
while np.absolute(phid - phi_dstar)>tol:
    j=i
    mrec = mrecovered(G,Wd,Wm,dobs,mref,beta)
    phid = L2(Wd,np.dot(G,mrec)-dobs)
    print("Iteration = ",j, "beta=", beta, "phid at start=",phid)
    if phid>phi_dstar:
        # phi_d still too large: keep halving beta
        beta = beta_0*0.5**i
        print("beta=", beta)
    if phid<phi_dstar:
        # phi_d now too small: bisect between the previous beta (too large) and the current one (too small)
        beta_abv = beta_0*0.5**(i-2)
        beta_bel = beta_0*0.5**(i-1)
        beta_mid = 0.5*(beta_abv+beta_bel)
        mmid = mrecovered(G,Wd,Wm,dobs,mref,beta_mid)
        phidmid = L2(Wd,np.dot(G,mmid)-dobs)
        while np.absolute(phidmid - phi_dstar)>tol:
            if phidmid<phi_dstar:
                beta_abv = beta_0*0.5**(i-2)
                beta_bel = beta_mid
                beta_mid = 0.5*(beta_abv+beta_bel)
                beta=beta_mid
                mmid = mrecovered(G,Wd,Wm,dobs,mref,beta_mid)
                phidmid = L2(Wd,np.dot(G,mmid)-dobs)
            if phidmid>phi_dstar:
                beta_abv = beta_mid
                beta_bel = beta_bel
                beta_mid = 0.5*(beta_abv+beta_bel)
                beta=beta_mid
                mmid = mrecovered(G,Wd,Wm,dobs,mref,beta_mid)
                phidmid = L2(Wd,np.dot(G,mmid)-dobs)
            if phidmid<phi_dstar:
                beta_abv = beta_abv
                beta_bel = beta_mid
                beta_mid = 0.5*(beta_abv+beta_bel)
                beta=beta_mid
                mmid = mrecovered(G,Wd,Wm,dobs,mref,beta_mid)
                phidmid = L2(Wd,np.dot(G,mmid)-dobs)
        phid=phidmid
        mrec=mmid
    i=i+1
print("At end beta=", beta,"phid=", phid)
This is what the recovered model looks like for our given $\beta$:
In [59]:
mrec = mrecovered(G,Wd,Wm,dobs,mref,beta)
phid = L2(Wd,np.dot(G,mrec)-dobs)
# Plot
pylab.plot(xc,m)
pylab.plot(xc,mrec)
pylab.xlabel('x')
pylab.ylabel('m(x)')
pylab.title('Model, $m(x)$')
Out[59]:
And this is what our predicted data will look like:
In [61]:
# Get predicted data
dpred = np.dot(G,mrec)
#pylab.plot(xd,r)
pylab.plot(xd,dpred,'o')
pylab.plot(xd,dobs)
pylab.plot(xd,d,'x')
pylab.xlabel('')
pylab.ylabel('d')
pylab.title('Predicted and Observed Data')
Out[61]:
Next, let's consider the two parameters $\alpha_s$ and $\alpha_x$ and get an understanding of the units for each. Going back to the original expression for continuous functions we had:
$$ \phi_m = \alpha_s \int m^2 \, dx + \alpha_x \int \left( \frac{dm}{dx} \right)^2 dx $$

Knowing that $\phi_m$ is unitless (and taking $m$ to be unitless as well), the first integral carries units of length, so $\alpha_s$ must have units of $[1/L]$; the second integral carries units of $1/L$, so $\alpha_x$ must have units of $[L]$. Taking the ratio of $\alpha_x$ over $\alpha_s$ gives:
$$ \frac{\alpha_x}{\alpha_s} = L^2 $$

or, in terms of the length $L$ in our single spatial dimension $x$, we write:
$$ L_x = \sqrt{\frac{\alpha_x}{\alpha_s}} $$

So a measure of the relative magnitudes of $\alpha_x$ and $\alpha_s$ is given by the length scale $L$, which ranges from the smallest spatial increment $\Delta x$ (assuming equal spacing between cells) to the total length of the domain, $M \Delta x$:
$$ \Delta x < L < M \Delta x $$

As this ratio becomes large, the model becomes smoother in the $x$-direction, and when $L$ is larger than 1 (the full length of our unit domain), structure in the model is heavily penalized.
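We can also run this relation in reverse: pick a target length scale first and set the ratio of the alphas from it. A minimal sketch (the target value of 0.1, one tenth of our unit-length domain, is just an illustrative choice):
In [ ]:
# Choose alpha_x from alpha_s and a desired smoothing length scale L (sketch)
L_target = 0.1                   # illustrative: one tenth of the domain
alpha_s = 0.1
alpha_x = alpha_s*L_target**2    # from L = sqrt(alpha_x/alpha_s)
print(alpha_x)                   # 0.001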
In [62]:
beta=5000
alpha_s=0.1 # Set alpha_s and alpha_x values
alpha_x=0.3
L=sqrt(alpha_x/alpha_s)
print(L)
# build Wm #
Wm=np.concatenate((alpha_s*Ws, alpha_x*Wx), axis=0)
mrec = mrecovered(G,Wd,Wm,dobs,mref,beta)
# Plot
pylab.plot(xc,m)
pylab.plot(xc,mrec)
pylab.xlabel('x')
pylab.ylabel('m(x)')
pylab.title('Model, $m(x)$')
Out[62]:
If this ratio is close to zero, then the smallness term dominates and the recovered models are rough.
In [63]:
beta=5000
alpha_s=0.1 # Set alpha_s and alpha_x values
alpha_x=0.00000001
L=sqrt(alpha_x/alpha_s)
print(L)
# build Wm #
Wm=np.concatenate((alpha_s*Ws, alpha_x*Wx), axis=0)
mrec = mrecovered(G,Wd,Wm,dobs,mref,beta)
# Plot
pylab.plot(xc,m)
pylab.plot(xc,mrec)
pylab.xlabel('x')
pylab.ylabel('m(x)')
pylab.title('Model, $m(x)$')
Out[63]:
This concludes the third module. Now let's return to the workflow that we saw at the beginning of this tutorial:
We have, for a very simple problem, covered the rudimentary ideas involved in performing an inversion. By creating a forward problem and adding noise to our data, we simulated the "survey" aspect, where we would collect data in the field. We built our kernel functions to model the physics of the problem and placed this information onto a one-dimensional mesh. Next, we defined a data misfit and a model regularization, and articulated our inverse problem as a gradient-based optimization. Last, we performed our inversion to obtain a recovered model and explored some practical aspects of choosing the parameters $\alpha_s$, $\alpha_x$, and $\beta$ that assist us in solving our problem.